
Layerwise Importance Analysis of Feed-Forward Networks in Transformer-based Language Models

Ikeda, Wataru, Yano, Kazuki, Takahashi, Ryosuke, Lee, Jaesung, Shibata, Keigo, Suzuki, Jun

arXiv.org Artificial Intelligence

This study investigates the layerwise importance of feed-forward networks (FFNs) in Transformer-based language models during pretraining. We introduce an experimental approach that, while maintaining the total parameter count, increases the FFN dimensions in some layers and completely removes the FFNs from other layers. Furthermore, since our focus is on the importance of FFNs during pretraining, we train models from scratch to examine whether the importance of FFNs varies depending on their layer positions, rather than using publicly available pretrained models, as is frequently done. Through comprehensive evaluations of models with varying sizes (285M, 570M, and 1.2B parameters) and layer counts (12, 24, and 40 layers), we demonstrate that concentrating FFNs in 70% of the consecutive middle layers consistently outperforms standard configurations for multiple downstream tasks.
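The parameter-matched reallocation described in the abstract can be sketched as a simple budgeting rule. The function below is an illustrative assumption, not the paper's code: it zeroes the FFN width outside the consecutive middle 70% of layers and widens the remaining FFNs so the total FFN width budget is unchanged.

```python
# Hypothetical sketch of parameter-matched FFN reallocation: remove FFNs
# outside the middle `keep_frac` of layers and enlarge the kept FFNs so
# the total FFN width budget stays constant.

def reallocate_ffn_dims(num_layers, base_ffn_dim, keep_frac=0.7):
    """Per-layer FFN dimensions concentrating FFNs in the middle layers."""
    keep = max(1, round(num_layers * keep_frac))   # layers that keep an FFN
    start = (num_layers - keep) // 2               # first kept layer
    total = num_layers * base_ffn_dim              # overall width budget
    widened = total // keep                        # budget spread over kept layers
    return [widened if start <= i < start + keep else 0
            for i in range(num_layers)]

dims = reallocate_ffn_dims(num_layers=12, base_ffn_dim=3072)
```

For a 12-layer model this removes the FFNs from the two outermost layers at each end and widens the middle eight, while the summed FFN width matches the standard configuration.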


Structural Reformation of Large Language Model Neuron Encapsulation for Divergent Information Aggregation

Bakushev, Denis, Boultinghouse, Gideon, Oppenheimer, Harriet, Gillingwater, Sebastian, Ashington, Valentina, Stanborough, Wilfred

arXiv.org Artificial Intelligence

Structured neuron encapsulation introduces a modular framework that enables more effective aggregation and specialization of information within deep learning architectures. A model modified through this framework demonstrated improved perplexity scores, greater lexical variability, and enhanced consistency in logical reasoning, suggesting that structured parameter distribution contributes to more efficient language representation. Statistical analyses of generated text highlighted a wider range of sentence structures and reduced redundancy in token selection, indicating that encapsulation fosters more adaptable language generation. A detailed evaluation of attention weight distributions revealed that the experimental model exhibited greater divergence in cross-layer activations, supporting the hypothesis that encapsulated neurons assume specialized processing roles. Logical consistency assessments further demonstrated that modular architectures mitigate contradictory outputs, reducing internal conflicts in inferred relationships between linguistic constructs. Computational trade-offs were analyzed, with results showing a minor increase in processing overhead, though improvements in parameter efficiency and structured decision-making compensated for the additional complexity. The mathematical formulation of the encapsulation mechanism confirmed that modular aggregation maintains stable convergence properties while promoting distinct functional roles for different neuron clusters.


The control architecture of a spherical robot for Minimally Invasive Surgery

Rus, Gabriela, Hajjar, Nadim Al, Tucan, Paul, Zima, Ionut, Vaida, Calin, Radu, Corina, Jucan, Daniel, Chablat, Damien, Pisla, Doina

arXiv.org Artificial Intelligence

Control systems used in Minimally Invasive Surgery (MIS) play a crucial role in ensuring precision and safety throughout procedures. This paper presents a control architecture developed for a robotic system designed for MIS operations. The modular structure of the control system allows for compatibility with a range of procedures in abdominal and thoracic regions. The proposed control system, employing the master-slave concept, is presented alongside the experimental model. Functional validation is obtained by performing a Siemens NX simulation and comparing the results with several experimental runs using the experimental model of the robot. With its compact size and stiffness, the system holds promise for integration with other robotic systems. Future efforts will be dedicated to exploring and optimizing this potential collaboration to enhance the overall capabilities of robotic-assisted surgery.


Accuracy and repeatability of a parallel robot for personalised minimally invasive surgery

Pisla, Doina, Tucan, Paul, Chablat, Damien, Hajjar, Nadim Al, Ciocan, Andra, Pisla, Adrian, Pusca, Alexandru, Radu, Corina, Pop, Grigore, Gherman, Bogdan

arXiv.org Artificial Intelligence

The paper presents the methodology used for accuracy and repeatability measurements of the experimental model of a parallel robot developed for surgical applications. The experimental setup uses a motion tracking system (for accuracy) and a high-precision measuring arm for position (for repeatability). The accuracy was obtained by comparing the trajectory data from the experimental measurement with a baseline trajectory defined with the kinematic models of the parallel robotic system. The repeatability was experimentally determined by repeatedly moving the robot platform to predefined points. Keywords: parallel robot, robotic assisted surgery, measurement, accuracy, repeatability.


Era Splitting -- Invariant Learning for Decision Trees

DeLise, Timothy

arXiv.org Artificial Intelligence

Real-life machine learning problems exhibit distributional shifts in the data from one time to another or from one place to another. This behavior is beyond the scope of the traditional empirical risk minimization paradigm, which assumes i.i.d. distribution of data over time and across locations. The emerging field of out-of-distribution (OOD) generalization addresses this reality with new theory and algorithms which incorporate environmental, or era-wise, information into the algorithms. So far, most research has been focused on linear models and/or neural networks. In this research we develop two new splitting criteria for decision trees, which allow us to apply ideas from OOD generalization research to decision tree models, including random forests and gradient-boosted decision trees. The new splitting criteria use era-wise information associated with each data point to allow tree-based models to find split points that are optimal across all disjoint eras in the data, instead of optimal over the entire data set pooled together, which is the default setting. In this paper we describe the problem setup in the context of financial markets. We describe the new splitting criteria in detail and develop unique experiments to showcase the benefits of these new criteria, which improve out-of-sample metrics in our experiments. The new criteria are incorporated into a state-of-the-art gradient-boosted decision tree model in the Scikit-Learn code base, which is made freely available.
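The contrast between pooled and era-wise split scoring can be illustrated with a toy criterion. This is a sketch under assumptions, not the paper's actual criteria: it scores a candidate split by its variance reduction within each era separately and aggregates with the mean, so a split must help across eras rather than only on the pooled data.

```python
import numpy as np

def pooled_gain(y, left_mask):
    """Default criterion: variance (SSE) reduction on the pooled data."""
    def sse(v):
        return ((v - v.mean()) ** 2).sum() if len(v) else 0.0
    return sse(y) - sse(y[left_mask]) - sse(y[~left_mask])

def era_wise_gain(y, left_mask, eras):
    """Illustrative era-wise criterion: mean per-era SSE reduction."""
    gains = [pooled_gain(y[eras == e], left_mask[eras == e])
             for e in np.unique(eras)]
    return float(np.mean(gains))
```

A split that looks strong on pooled data can score very differently once the gain is required to hold within every era, which is the behavior the new criteria exploit.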


Symbolic Synthesis of Neural Networks

Whitehouse, Eli

arXiv.org Artificial Intelligence

Neural networks adapt very well to distributed and continuous representations, but struggle to generalize from small amounts of data. Symbolic systems commonly achieve data-efficient generalization by exploiting modularity to benefit from local and discrete features of a representation. These features allow symbolic programs to be improved one module at a time and to experience combinatorial growth in the values they can successfully process. However, it is difficult to design a component that can be used to form symbolic abstractions and which is adequately overparametrized to learn arbitrary high-dimensional transformations. I present Graph-based Symbolically Synthesized Neural Networks (G-SSNNs), a class of neural modules that operate on representations modified with synthesized symbolic programs to include a fixed set of local and discrete features. I demonstrate that the choice of injected features within a G-SSNN module modulates the data efficiency and generalization of baseline neural models, creating predictable patterns of both heightened and curtailed generalization. By training G-SSNNs, we also derive information about desirable semantics of symbolic programs without manual engineering. This information is compact and amenable to abstraction, but can also be flexibly recontextualized for other high-dimensional settings. In future work, I will investigate data-efficient generalization and the transferability of learned symbolic representations in more complex G-SSNN designs based on more complex classes of symbolic programs. Experimental code and data are available at https://github.com/shlomenu/symbolically_synthesized_networks.


Compiler Provenance Recovery for Multi-CPU Architectures Using a Centrifuge Mechanism

Otsubo, Yuhei, Otsuka, Akira, Mimura, Mamoru

arXiv.org Artificial Intelligence

Bit-stream recognition (BSR) has many applications, such as forensic investigations, detection of copyright infringement, and malware analysis. We propose the first BSR that takes a bare input bit-stream and outputs a class label without any preprocessing. To achieve our goal, we propose a centrifuge mechanism, where the upstream layers (sub-net) capture global features and tell the downstream layers (main-net) to switch the focus, even if a part of the input bit-stream has the same value. We applied the centrifuge mechanism to compiler provenance recovery, a type of BSR, and achieved excellent classification. Additionally, downstream transfer learning (DTL), one of the learning methods we propose for the centrifuge mechanism, pre-trains the main-net using the sub-net's ground truth instead of the sub-net's output. We found that sub-predictions made by DTL tend to be highly accurate when the sub-label classification contributes to the essence of the main prediction.
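The sub-net/main-net interaction can be made concrete with a highly schematic gating sketch. All shapes and the gating form below are assumptions rather than the paper's architecture: a small sub-net predicts a global class from the whole bit-stream, and its output gates the main-net's input so identical local byte values can be processed differently depending on the global context.

```python
import numpy as np

rng = np.random.default_rng(0)
bitstream = rng.integers(0, 2, size=64).astype(float)  # bare input bit-stream

W_sub = rng.normal(size=(64, 4))    # "upstream" sub-net weights (illustrative)
W_main = rng.normal(size=(64, 4))   # "downstream" main-net weights (illustrative)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

sub_pred = softmax(bitstream @ W_sub)        # global prediction from the sub-net
gate = sub_pred @ rng.normal(size=(4, 64))   # context-dependent gate over positions
main_feat = (bitstream * gate) @ W_main      # main-net sees globally gated input
```

Under the proposed downstream transfer learning (DTL), the main-net would be pre-trained with the sub-net's ground-truth labels in place of `sub_pred`; the gating above is only meant to show how a global signal can redirect local processing.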


Transfer learning driven design optimization for inertial confinement fusion

Humbird, K. D., Peterson, J. L.

arXiv.org Artificial Intelligence

Transfer learning is a promising approach to creating predictive models that incorporate simulation and experimental data into a common framework. In this technique, a neural network is first trained on a large database of simulations, then partially retrained on sparse sets of experimental data to adjust predictions to be more consistent with reality. Previously, this technique has been used to create predictive models of Omega and NIF inertial confinement fusion (ICF) experiments that are more accurate than simulations alone. In this work, we conduct a transfer learning driven hypothetical ICF campaign in which the goal is to maximize experimental neutron yield via Bayesian optimization. The transfer learning model achieves yields within 5% of the maximum achievable yield in a modest-sized design space in fewer than 20 experiments. Furthermore, we demonstrate that this method is more efficient at optimizing designs than traditional model calibration techniques commonly employed in ICF design. Such an approach to ICF design could enable robust optimization of experimental performance under uncertainty.
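The partial-retraining idea behind the transfer learning model can be sketched in a few lines. The two-layer setup below is a minimal illustrative assumption (not the NIF/Omega model): a hidden layer standing in for the simulation-pretrained network is frozen, and only the output layer is retrained on a small "experimental" data set.

```python
import numpy as np

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 8))   # frozen: stands in for simulation pretraining
w_out = np.zeros(8)                  # retrained on sparse experimental data

def features(x):
    return np.tanh(x @ W_hidden)     # frozen feature map

X = rng.normal(size=(16, 4))                 # sparse "experimental" inputs
y = features(X) @ rng.normal(size=8)         # synthetic targets for the sketch

for _ in range(500):                         # gradient descent on w_out only
    H = features(X)
    grad = 2 * H.T @ (H @ w_out - y) / len(X)
    w_out -= 0.1 * grad

mse = float(np.mean((features(X) @ w_out - y) ** 2))
```

Because only the last layer moves, a handful of experimental points suffices to shift predictions toward reality without discarding what the simulation database taught the rest of the network.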


Analyzing Multispectral Satellite Imagery of South American Wildfires Using Deep Learning

Sun, Christopher

arXiv.org Artificial Intelligence

Since frequent severe droughts are lengthening the dry season in the Amazon Rainforest, it is important to detect wildfires promptly and forecast possible spread for effective suppression response. Current wildfire detection models are not versatile enough for the low-technology conditions of South American hot spots. This deep learning study first trains a Fully Convolutional Neural Network on Landsat 8 images of Ecuador and the Galapagos, using Green and Short-wave Infrared bands to predict pixel-level binary fire masks. This model achieves a 0.962 validation F2 score and a 0.932 F2 score on test data from Guyana and Suriname. Afterward, image segmentation is conducted on the Cirrus band using K-Means Clustering to simplify continuous pixel values into three discrete classes representing differing degrees of cirrus cloud contamination. Three additional Convolutional Neural Networks are trained to conduct a sensitivity analysis measuring the effect of simplified features on model accuracy and train time. The Experimental model trained on the segmented cirrus images provides a statistically significant decrease in train time compared to the Control model trained on raw cirrus images, without compromising binary accuracy. This proof of concept reveals that feature engineering can improve the performance of wildfire detection models by lowering computational expense.
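The cirrus-band simplification step amounts to 1-D K-Means with k=3. The plain Lloyd iteration below is a sketch (the study's exact preprocessing may differ): continuous pixel values are replaced by one of three cluster labels representing degrees of cirrus contamination.

```python
import numpy as np

def kmeans_1d(pixels, k=3, iters=20):
    """Lloyd's K-Means on scalar pixel values, quantile-initialized."""
    qs = (np.arange(k) + 0.5) / k            # spread initial centers by quantile
    centers = np.quantile(pixels, qs)
    for _ in range(iters):
        # assign each pixel to its nearest center
        labels = np.abs(pixels[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):                   # recompute non-empty centers
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean()
    return labels, centers

# toy "cirrus band" with three contamination levels
pixels = np.concatenate([np.full(100, 0.1), np.full(100, 0.5), np.full(100, 0.9)])
labels, centers = kmeans_1d(pixels)
```

Feeding the three discrete labels (rather than raw continuous values) to the downstream CNN is what yields the reported reduction in train time at matched accuracy.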


DeepMind's AI predicts structures for a vast trove of proteins

#artificialintelligence

The human mediator complex has long been one of the most challenging multi-protein systems for structural biologists to understand. (Credit: Yuan He.) The human genome holds the instructions for more than 20,000 proteins. But only about one-third of those have had their 3D structures determined experimentally. And in many cases, those structures are only partially known. Now, a transformative artificial intelligence (AI) tool called AlphaFold, which has been developed by Google's sister company DeepMind in London, has predicted the structure of nearly the entire human proteome (the full complement of proteins expressed by an organism). In addition, the tool has predicted almost complete proteomes for various other organisms, ranging from mice and maize (corn) to the malaria parasite (see 'Folding options').